# Efficient Deployment

- **Orpheus-3B-0.1-FT Q4_K_M GGUF** (Apache-2.0, by freddyaboulton) — GGUF-quantized version of Orpheus-3B-0.1-FT, suitable for efficient inference. Tags: Large Language Model, English. Downloads: 30 · Likes: 1
- **DeepSeek-R1-Medical-COT GGUF** (Apache-2.0, by tensorblock) — Chain-of-Thought reasoning model specialized for the medical domain, offered in multiple quantized versions to accommodate different hardware. Tags: Large Language Model, English. Downloads: 180 · Likes: 1
- **Qwen2.5-VL-7B-Instruct-FP8-Dynamic** (Apache-2.0, by RedHatAI) — FP8-quantized version of Qwen2.5-VL-7B-Instruct, supporting efficient vision-text inference via vLLM. Tags: Text-to-Image, Transformers, English. Downloads: 25.18k · Likes: 1
- **DeepSeek-R1-Distill-Llama-70B-FP8-Dynamic** (MIT, by RedHatAI) — FP8-quantized version of DeepSeek-R1-Distill-Llama-70B, which improves inference performance by reducing the bit width of weights and activations. Tags: Large Language Model, Transformers. Downloads: 45.77k · Likes: 9
- **Molmo-7B-D-0924-NF4** (Apache-2.0, by Scoolar) — 4-bit quantized version of Molmo-7B-D-0924, which lowers VRAM usage through NF4 quantization and is suitable for environments with limited VRAM. Tags: Image-to-Text, Transformers. Downloads: 1,259 · Likes: 1
- **pixtral-12b-FP8-dynamic** (Apache-2.0, by RedHatAI) — quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements by approximately 50%. Suitable for commercial and research use in multiple languages. Tags: Text-to-Image, Safetensors, multilingual. Downloads: 87.31k · Likes: 9
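The "approximately 50%" figure above can be sanity-checked with simple arithmetic. A minimal sketch, assuming the 12B parameter count implied by the model name, a 16-bit baseline, and ignoring activation memory and per-tensor scale overhead:

```python
def model_weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Approximate storage for dense model weights alone."""
    return n_params * bits_per_param // 8

params = 12_000_000_000                 # pixtral-12b (from the model name)
bf16 = model_weight_bytes(params, 16)   # 16-bit baseline weights
fp8 = model_weight_bytes(params, 8)     # FP8-quantized weights

print(f"16-bit: {bf16 / 1e9:.0f} GB")   # 24 GB
print(f"FP8:    {fp8 / 1e9:.0f} GB")    # 12 GB
print(f"saved:  {1 - fp8 / bf16:.0%}")  # 50%
```

Halving the bits per parameter halves the weight footprint, which is where the roughly 50% disk and GPU memory reduction comes from; real savings vary slightly with quantization metadata and unquantized layers.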
- **QQQ-Llama-3-8b-g128** (MIT, by HandH1998) — Llama-3-8b quantized to INT4 using the QQQ quantization technique with a group size of 128 and hardware-aware optimization. Tags: Large Language Model, Transformers. Downloads: 1,708 · Likes: 2
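The "group size of 128" above means every group of 128 weights shares one quantization scale. The amortized storage cost of that scale can be sketched as follows (a simplification assuming a 16-bit scale per group and ignoring zero-points and other metadata):

```python
def effective_bits_per_weight(weight_bits: int, group_size: int,
                              scale_bits: int = 16) -> float:
    """Weight bits plus the per-group scale amortized over the group."""
    return weight_bits + scale_bits / group_size

# INT4 with group size 128: the shared scale adds only 0.125 bits/weight.
print(effective_bits_per_weight(4, 128))  # 4.125
```

Smaller groups track the weight distribution more closely and usually improve accuracy, but at the cost of storing more scales per tensor.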
© 2025 AIbase